Cocktail Party Processing via Structured Prediction
Abstract
While human listeners excel at selectively attending to one conversation at a cocktail party, machine performance still lags far behind. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields (CRFs) to classify speech dominance within each time-frequency (T-F) unit of a sound mixture. To capture the complex, nonlinear relationship between input and output, both the state and transition feature functions in the CRFs are learned by deep neural networks. Formulating the problem as classification allows us to directly optimize a measure that is well correlated with human speech intelligibility. The proposed system substantially outperforms existing systems across a variety of noise types.
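The abstract combines three ingredients: per-unit speech-dominance labels, a CRF that decodes those labels with temporal context, and an intelligibility-correlated score. The following is a minimal NumPy sketch of how these pieces could fit together for a single frequency channel. It rests on assumptions the abstract does not state: the target labels are taken to be an ideal binary mask with a local criterion of -5 dB, the CRF is a linear chain along time decoded with the Viterbi algorithm, the DNN-learned feature functions are replaced by fixed hypothetical scores (`posterior_speech`, `trans_scores`), and the optimized measure is assumed to be HIT-FA (hit rate minus false-alarm rate). All function and variable names are illustrative. The sketch covers inference and scoring only, not the authors' training procedure.

```python
import numpy as np


def ideal_binary_mask(speech_power, noise_power, lc_db=-5.0):
    """Label each T-F unit 1 (speech-dominant) if local SNR > lc_db, else 0.

    Assumption: lc_db = -5 dB is a common choice, not taken from the abstract.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(int)


def viterbi_decode(state_scores, trans_scores):
    """Highest-scoring label sequence of a linear-chain CRF.

    state_scores: (T, K) per-frame log-potentials for each label.
    trans_scores: (K, K) log-potential for label i at t-1 followed by j at t.
    """
    T, K = state_scores.shape
    best = np.zeros((T, K))
    backptr = np.zeros((T, K), dtype=int)
    best[0] = state_scores[0]
    for t in range(1, T):
        # cand[i, j] = best path ending in label i at t-1, then stepping to j
        cand = best[t - 1][:, None] + trans_scores
        backptr[t] = np.argmax(cand, axis=0)
        best[t] = cand.max(axis=0) + state_scores[t]
    labels = np.zeros(T, dtype=int)
    labels[-1] = np.argmax(best[-1])
    for t in range(T - 1, 0, -1):  # follow back-pointers to frame 0
        labels[t - 1] = backptr[t, labels[t]]
    return labels


def hit_minus_fa(estimated, ideal):
    """HIT: fraction of speech-dominant units labeled 1.
    FA: fraction of noise-dominant units labeled 1."""
    estimated = np.asarray(estimated, dtype=bool)
    ideal = np.asarray(ideal, dtype=bool)
    hit = estimated[ideal].mean() if ideal.any() else 0.0
    fa = estimated[~ideal].mean() if (~ideal).any() else 0.0
    return float(hit - fa)


# Hypothetical single-channel example, 6 frames; label 0 = noise, 1 = speech.
speech = np.array([4.0, 5.0, 6.0, 0.1, 0.1, 0.1])
noise = np.ones(6)
ideal = ideal_binary_mask(speech, noise)

# Hypothetical stand-in for DNN outputs: noisy posteriors P(speech-dominant).
posterior_speech = np.array([0.9, 0.4, 0.8, 0.2, 0.6, 0.1])
state_scores = np.log(np.stack([1 - posterior_speech, posterior_speech], axis=1))
# Hypothetical stand-in for learned transition features: staying is favored.
trans_scores = np.array([[0.0, -1.0], [-1.0, 0.0]])

framewise = np.argmax(state_scores, axis=1)  # ignores temporal context
crf_labels = viterbi_decode(state_scores, trans_scores)
print("ideal     ", ideal)
print("frame-wise", framewise, hit_minus_fa(framewise, ideal))
print("CRF       ", crf_labels, hit_minus_fa(crf_labels, ideal))
```

In this toy channel, the frame-wise decision mislabels frames 1 and 4 (counting from 0) and scores a HIT-FA of about 0.33, while Viterbi decoding with transition scores that favor label continuity recovers the ideal sequence (HIT-FA of 1.0). The numbers were chosen to show the mechanism: hand-set transition scores can just as easily smooth away genuine short speech bursts, which is one reason to learn them from data.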
Similar resources
Deep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis
In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNN...
Probabilistic Binary-Mask Cocktail-Party Source Separation in a Convolutional Deep Neural Network
Separation of competing speech is a key challenge in signal processing and a feat routinely performed by the human auditory brain. A long-standing benchmark of the spectrogram approach to source separation is known as the ideal binary mask. Here, we train a convolutional deep neural network, on a two-speaker cocktail party problem, to make probabilistic predictions about binary masks. Our result...
A Biologically Motivated Solution to the Cocktail Party Problem
We present a new approach to the cocktail party problem that uses a cortronic artificial neural network architecture (Hecht-Nielsen, 1998) as the front end of a speech processing system. Our approach is novel in three important respects. First, our method assumes and exploits detailed knowledge of the signals we wish to attend to in the cocktail party environment. Second, our goal is to provide...
L'amorçage sémantique masqué en situation de cocktail party (Masked semantic priming in cocktail party situation) [in French]
The present study aimed at testing automatic semantic processing in the auditory modality using the cocktail party situation. Participants had to perform a lexical decision task on a target item embedded in a multi-talker babble. This babbl...
Improved Cocktail-Party Processing
The human auditory system is able to focus on one speech signal and ignore other speech signals in an auditory scene where several conversations are taking place. This ability of the human auditory system is referred to as the “cocktail-party effect”. This property of human hearing is partly made possible by binaural listening. Interaural time differences (ITDs) and interaural level differences...